An Oversampling Method for Class Imbalance Problems on Large Datasets

Abstract

Several oversampling methods have been proposed for solving the class imbalance problem. However, most of them require searching for k-nearest neighbors to generate synthetic objects. This requirement makes them time-consuming and therefore unsuitable for large datasets. In this paper, an oversampling method for class imbalance problems that does not require a k-nearest neighbors search is proposed. According to our experiments on datasets with different sizes and degrees of imbalance, the proposed method is at least twice as fast as the 8 fastest oversampling methods reported in the literature while obtaining similar oversampling quality.
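To make the cost mentioned in the abstract concrete, the following is a minimal sketch of the conventional, neighbor-based style of oversampling (SMOTE-like interpolation) that the abstract contrasts against. It is not the method proposed in the paper, which this page does not describe. The function names `knn_indices` and `smote_like_oversample`, the brute-force search, and the toy data are all assumptions made for illustration.

```python
import random


def knn_indices(points, i, k):
    """Brute-force k-nearest neighbors of points[i] (squared Euclidean distance).

    Hypothetical helper: each call scans every other point, which is the
    step that becomes expensive on large datasets.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: sqdist(points[i], points[j]))
    return others[:k]


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward a random neighbor.

    Illustrative SMOTE-style sketch, not the paper's algorithm.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, i, k))  # neighbor search per sample
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


# Toy minority class inside the unit square (made-up data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote_like_oversample(minority, n_new=4)
```

Because every synthetic point triggers a scan over the minority class, the running time of this sketch grows roughly quadratically with the number of minority objects. Avoiding that search is the motivation stated in the abstract.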

Related resources

C4.5 Consolidation Process: An Alternative to Intelligent Oversampling Methods in Class Imbalance Problems

In real-world problems solved using data mining techniques, it is very common to find data in which the number of examples of one class is much smaller than the number of examples of the remaining classes. Much work has been done to deal with these problems, known as class imbalance problems. Most of it focuses on data resampling techniques so that training data would be...

Consolidation Process: An Alternative to Intelligent Oversampling Methods in Class Imbalance Problems

In real-world problems solved using data mining techniques, it is very common to find data in which the number of examples of one class is much smaller than the number of examples of the remaining classes. Much work has been done to deal with these problems, known as class imbalance problems. Most of it focuses on data resampling techniques so that training data would be...

Generative Oversampling for Mining Imbalanced Datasets

One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly...

An Efficient Numerical Method for a Class of Boundary Value Problems, Based on Shifted Jacobi-Gauss Collocation Scheme

We present a numerical method for a class of boundary value problems on the unit interval that feature exponential and product nonlinearities. We also consider the singular case. We construct a spectral collocation method based on shifted Jacobi polynomials to implement this method. A number of specific numerical examples demonstrate the accuracy and the efficiency of the propos...

Addressing the Class Imbalance Problem in Medical Datasets

A well-balanced dataset is very important for creating a good prediction model. Medical datasets are often not balanced in their class labels. Most existing classification methods tend to perform poorly on minority class examples when the dataset is extremely imbalanced. This is because they aim to optimize the overall accuracy without considering the relative distribution of each class. In thi...

Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app12073424